LOCAL AND SUPERLINEAR CONVERGENCE
OF STRUCTURED SECANT METHODS
FROM THE CONVEX CLASS


Abstract. In this paper we develop a unified theory for establishing the local and q-superlinear convergence of the secant methods from the convex class that take advantage of the structure present in the Hessian in constructing approximate Hessians.

As an application of this theory, we show the local and q-superlinear convergence of any structured secant method from the convex class for the constrained optimization problem and the nonlinear least-squares problem. Particular cases of these methods are the SQP augmented scale BFGS and DFP secant methods for constrained optimization problems introduced by Tapia. Another particular case, for which local and q-superlinear convergence is proved for the first time here, is the Al-Baali and Fletcher modification of the structured BFGS secant method considered by Dennis, Gay and Welsch for the nonlinear least-squares problem and implemented in the current version of the NL2SOL code.


Key words: secant, quasi-Newton, least-squares, superlinear convergence, bounded deterioration, constrained optimization.


1980 Mathematics Subject Classification: Primary 49D15, 65K05.

1. Introduction.

Our goal is to develop a unified theory that can be used to establish the local and q-superlinear convergence of the secant methods from the convex class studied by Broyden [1967] and Fletcher [1970] that take advantage of the structure present in the Hessian in constructing approximate Hessians.

The theory we will give can be seen either as a generalization of the result for the structured DFP secant method given by Dennis and Walker [1981] to any structured secant method in the convex class, or as an extension of the results for the (unstructured) secant methods from the convex class obtained by Griewank and Toint [1982] to the structured secant methods in the same class. Indeed, our approach is similar to the one used in both of these papers.

As a surprising consequence of our careful computation of the constants in the bounded deterioration principle, we obtain a stronger bounded deterioration inequality for the BFGS secant method.


1.1. The Secant Method.

By a secant method for the optimization problem

$$\min_{x}\; f(x), \tag{1.1}$$

where $f:\mathbb{R}^n \to \mathbb{R}$, we mean the iterative procedure

$$x_+ = x + s, \qquad B_+ = \mathbb{B}(x, s, y, B). \tag{1.2}$$

Here $y$ is an approximation to $\nabla^2 f(x_+)s$, $s$ is the quasi-Newton step defined by

$$Bs = -\nabla f(x), \tag{1.3}$$

and $B_+$ is required to satisfy the secant equation

$$B_+ s = y. \tag{1.4}$$

The most common way of defining $y$ is

$$y = \nabla f(x_+) - \nabla f(x). \tag{1.5}$$
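As an illustration of (1.2)-(1.5), here is a minimal Python sketch of the secant iteration on a small quadratic. Several pieces are assumptions made only for this example: the data Q and b, the starting pair (x, B), and the choice of the BFGS formula (the member $\phi = 0$ of the Broyden class described in Section 1.1.1) as the update $\mathbb{B}$. The function names are hypothetical.

```python
"""Minimal sketch of the secant iteration (1.2)-(1.5).

Assumptions for illustration only: f(x) = 0.5 x^T Q x - b^T x with the
hypothetical data Q, b below, and the BFGS formula as the update B+ = BB(x, s, y, B).
"""


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            M[i] = [a - f * c for a, c in zip(M[i], M[k])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - dot(M[i][i + 1:n], x[i + 1:n])) / M[i][i]
    return x


Q = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]


def grad(x):
    return [qx - bi for qx, bi in zip(matvec(Q, x), b)]


def bfgs_update(B, s, y):
    """B+ = B - B s s^T B / (s^T B s) + y y^T / (y^T s)."""
    Bs = matvec(B, s)
    sBs, ys = dot(s, Bs), dot(y, s)
    n = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys
             for j in range(n)] for i in range(n)]


def secant_method(x, B, tol=1e-10, max_iter=50):
    secant_residuals = []
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            break
        s = solve(B, [-gi for gi in g])                  # quasi-Newton step (1.3)
        x_new = [xi + si for xi, si in zip(x, s)]         # x+ = x + s, (1.2)
        y = [gn - gi for gn, gi in zip(grad(x_new), g)]   # y as in (1.5)
        B = bfgs_update(B, s, y)
        # record how well B+ satisfies the secant equation (1.4)
        secant_residuals.append(max(abs(r - yi) for r, yi in zip(matvec(B, s), y)))
        x = x_new
    return x, B, secant_residuals


x_final, B_final, residuals = secant_method([1.0, 1.0], [[3.2, 1.0], [1.0, 2.1]])
print(x_final)  # should approach the minimizer Q^{-1} b = [0.2, 0.4]
```

For a quadratic objective, (1.5) gives $y = Qs$ exactly. For a general $f$, $y$ is only an approximation to $\nabla^2 f(x_+)s$.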

A large class of methods of this type has been studied by Broyden [1967], Fletcher [1970], Greenstadt [1970], Huang [1970], Dennis [1972], Schnabel [1977], and numerous other authors.


1.1.1. The Broyden Class of Secant Updates.

We will call the set of "exact", "stable", and symmetric rank-2 secant updates suggested by Broyden [1967] the Broyden class of secant updates. In the literature, this class is also referred to as the Broyden β-class of secant updates because it was initially parametrized by a real scalar β. Fletcher [1970] shows that this class of secant updates can be written as

$$B_+ = B + A_1(s, y, B, \phi), \tag{1.6}$$

where the parameter $\phi \in \mathbb{R}$, and the update correction $A_1(s, y, B, \phi)$ is given by

$$A_1(s,y,B,\phi) = \frac{yy^T}{y^Ts} - \frac{Bss^TB}{s^TBs} + \phi\,(s^TBs)\,uu^T, \tag{1.7a}$$

$$u = \frac{y}{y^Ts} - \frac{Bs}{s^TBs}. \tag{1.7b}$$

The following are well-known choices of the parameter $\phi$:

$$\begin{array}{lll}
\text{Convex class:} & \phi \in [0,1] & (1.8a)\\
\text{DFP:} & \phi = 1 & (1.8b)\\
\text{BFGS:} & \phi = 0 & (1.8c)\\
\text{SR1:} & \phi = \dfrac{y^Ts}{y^Ts - s^TBs} & (1.8d)
\end{array}$$

Another important class of secant updates, suggested by Greenstadt [1970], is the set of all the symmetric secant updates that minimize a weighted Frobenius norm of $B_+ - B$ (see Dennis and Walker [1981] for more details). Dennis [1972] derived a larger class of symmetric secant updates as the limit of an iterative process and showed that this larger class can be written as

$$B_+ = B + A_2(s, y, B, v), \tag{1.9}$$

where the vector $v \in \mathbb{R}^n$ is called the scale (see Dennis and Walker [1981]), and the update correction $A_2(s, y, B, v)$ is given by

$$A_2(s,y,B,v) = \frac{(y-Bs)v^T + v(y-Bs)^T}{v^Ts} - \frac{(y-Bs)^Ts}{(v^Ts)^2}\,vv^T. \tag{1.10}$$

The scale $v$ is often a function of $s$, $y$, and $B$, as is the case for the following well-known members of this class:

$$\begin{array}{lll}
\text{PSB:} & v = s & (1.11a)\\
\text{DFP:} & v = y & (1.11b)\\
\text{BFGS:} & v = y + \left[\dfrac{y^Ts}{s^TBs}\right]^{1/2} Bs & (1.11c)\\
\text{SR1:} & v = y - Bs & (1.11d)
\end{array}$$
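The following sketch is a small numerical check of (1.10)-(1.11), again with hypothetical data. It forms $A_2(s,y,B,v)$ for the four scales listed above and confirms that each satisfies the secant equation. It then compares the DFP and BFGS scales with the Broyden-class corrections $A_1(s,y,B,1)$ and $A_1(s,y,B,0)$. This comparison is consistent with the fact that these two scales are linear combinations of $y$ and $Bs$.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def dennis_correction(s, y, B, v):
    """Update correction A_2(s, y, B, v) of (1.10) for a scale v with v^T s != 0."""
    r = [yi - wi for yi, wi in zip(y, matvec(B, s))]  # r = y - Bs
    vs, rs = dot(v, s), dot(r, s)
    n = len(s)
    return [[(r[i] * v[j] + v[i] * r[j]) / vs - rs * v[i] * v[j] / vs ** 2
             for j in range(n)] for i in range(n)]


def broyden_correction(s, y, B, phi):
    """Update correction A_1(s, y, B, phi) of (1.7a)-(1.7b)."""
    Bs = matvec(B, s)
    ys, sBs = dot(y, s), dot(s, Bs)
    u = [yi / ys - wi / sBs for yi, wi in zip(y, Bs)]
    n = len(s)
    return [[y[i] * y[j] / ys - Bs[i] * Bs[j] / sBs + phi * sBs * u[i] * u[j]
             for j in range(n)] for i in range(n)]


def max_gap(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


# Hypothetical data with y^T s > 0 and s^T B s > 0.
B = [[2.0, 0.5, 0.0], [0.5, 1.5, 0.2], [0.0, 0.2, 1.0]]
s = [1.0, -1.0, 0.5]
y = [1.5, -0.8, 0.9]
Bs = matvec(B, s)
ys, sBs = dot(y, s), dot(s, Bs)

scales = {                                              # (1.11a)-(1.11d)
    "PSB": s,
    "DFP": y,
    "BFGS": [yi + math.sqrt(ys / sBs) * wi for yi, wi in zip(y, Bs)],
    "SR1": [yi - wi for yi, wi in zip(y, Bs)],
}
A2 = {name: dennis_correction(s, y, B, v) for name, v in scales.items()}

# Each member satisfies the secant equation: A_2 s = y - Bs, i.e. (B + A_2) s = y.
secant_gap = max(max(abs(a - (yi - wi)) for a, yi, wi in zip(matvec(M, s), y, Bs))
                 for M in A2.values())
dfp_gap = max_gap(A2["DFP"], broyden_correction(s, y, B, 1.0))    # DFP: phi = 1
bfgs_gap = max_gap(A2["BFGS"], broyden_correction(s, y, B, 0.0))  # BFGS: phi = 0
print(secant_gap, dfp_gap, bfgs_gap)
```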

Dennis [1972] also pointed out that a member of his class was a member of the Broyden class only if the scale $v$ is a linear combination of $y$ and $Bs$. Schnabel [1977] proved that there exists an onto mapping from

$$\left\{A_2(s,y,B,v) : v = y + \alpha(y - Bs),\ \alpha \in \mathbb{R},\ \alpha \ne \frac{y^Ts}{s^TBs - y^Ts}\right\} \tag{1.12a}$$

to

$$\left\{A_1(s,y,B,\phi) : \phi \in \mathbb{R},\ (1-\phi)\frac{y^Ts}{s^TBs} + \phi > 0\right\}. \tag{1.12b}$$

Three remarks are important here. First, set (1.12b) is the set of rank-2 matrices that have the form given by (1.7) and can be written as $ww^T - zz^T$ for some nonzero vectors $w, z \in \mathbb{R}^n$. Second, if $B$ is positive definite and $y^Ts > 0$, set (1.12b) contains the convex class of secant updates, since then $(1-\phi)\,y^Ts/s^TBs + \phi > 0$ for every $\phi \in [0,1]$. Finally, this mapping will be crucial in extending the bounded deterioration principle for the (unstructured) secant methods to the corresponding structured ones (see Theorem 2.5).

1.1.2. The Structured Secant Method.

Often, in practice, part of $\nabla^2 f(x)$ is available and we need to approximate only the remaining part. Suppose that

$$\nabla^2 f(x) = C(x) + S(x), \tag{1.13}$$

where $C : \mathbb{R}^n \to \mathbb{R}^{n\times n}$ is the available part of $\nabla^2 f$. In several important applications, e.g., nonlinear least squares, $C(x)$ is composed of first-order information and $S(x)$ requires second-order information.

By a structured approximation of $\nabla^2 f(x)$ we mean an approximation of the form

$$B = A + C(x), \tag{1.14}$$

where $A$ is an approximation to $S(x)$. Moreover, if $B$ is updated according to the formula $B_+ = A_+ + C(x_+)$, where

$$A_+ = A + A_2(s, y^\#, A, v), \tag{1.15}$$

$v = v(s, y, B)$, and $y^\#$ and $y$ are approximations to $S(x_+)s$ and $\nabla^2 f(x_+)s$ respectively, we call $B_+$ a structured secant approximation of $\nabla^2 f(x_+)$. Observe that the structured update (1.15) satisfies the secant equation

$$A_+ s = y^\#. \tag{1.16}$$

We obtain a structured secant method for problem (1.1) if we use $B_+ = A_+ + C(x_+)$ instead of $B_+ = \mathbb{B}(x, s, y, B)$ in (1.2), where $A_+$ is given by (1.15).
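To make (1.14)-(1.16) concrete, the sketch below performs one structured update. The matrices standing in for $A$, $C(x)$ and $C(x_+)$, and the vectors $s$ and $y^\#$, are hypothetical. Two further choices are assumptions of this sketch: $y = y^\# + C(x_+)s$, and the BFGS scale (1.11c) computed from $(s, y, B)$. The code checks (1.16), and also checks that $B_+ = A_+ + C(x_+)$ then maps $s$ to $y$.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def madd(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def dennis_correction(s, y, B, v):
    """A_2(s, y, B, v) of (1.10)."""
    r = [yi - wi for yi, wi in zip(y, matvec(B, s))]
    vs, rs = dot(v, s), dot(r, s)
    n = len(s)
    return [[(r[i] * v[j] + v[i] * r[j]) / vs - rs * v[i] * v[j] / vs ** 2
             for j in range(n)] for i in range(n)]


# Hypothetical data: A approximates S(x); C_x and C_xp play the roles of C(x) and C(x+).
A = [[0.5, 0.1], [0.1, 0.3]]
C_x = [[2.0, 0.3], [0.3, 1.0]]
C_xp = [[2.1, 0.25], [0.25, 1.1]]
s = [0.4, -0.2]
y_sharp = [0.15, -0.02]                  # y#, approximation to S(x+)s

# Assumption for this sketch: y = y# + C(x+)s approximates the full Hessian times s.
y = [a + c for a, c in zip(y_sharp, matvec(C_xp, s))]
B = madd(A, C_x)                         # structured approximation (1.14)

# Scale v = v(s, y, B): here the BFGS scale (1.11c), computed from the full quantities.
Bs = matvec(B, s)
v = [yi + math.sqrt(dot(y, s) / dot(s, Bs)) * wi for yi, wi in zip(y, Bs)]

A_plus = madd(A, dennis_correction(s, y_sharp, A, v))   # structured update (1.15)
B_plus = madd(A_plus, C_xp)                              # B+ = A+ + C(x+)

gap_A = max(abs(a - b) for a, b in zip(matvec(A_plus, s), y_sharp))  # (1.16)
gap_B = max(abs(a - b) for a, b in zip(matvec(B_plus, s), y))
print(gap_A, gap_B)
```

Note that $y^\#$ and $A$ enter the correction itself, while the full quantities $y$ and $B$ enter only through the scale $v$.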

Historically, a primary example of the use of structure has been the nonlinear least-squares problem, e.g., Brown and Dennis [1971], Dennis [1973], Betts [1976], Dennis [1976], Dennis [1977], Bartholomew-Biggs [1977], Dennis [1978], Dennis and Welsch [1978], Gill and Murray [1978], Dennis, Gay and Welsch [1978], Dennis and Walker [1981], Dennis and Schnabel [1983], Al-Baali and Fletcher [1985], Xu [1986], Fletcher and Xu [1987], and Toint [1987] (see Martinez [1988], Section 4.1.1.2, for more details about these works).

Initially, the structure was not carried into the calculation of the scale. Al-Baali and Fletcher [1985] were the first to suggest using structure in the scale $v$ as well. Independently, Tapia [1984] employed a structured scale in his work on structured updates for constrained optimization problems.

Dennis and Walker [1981] developed a convergence theory that includes the structured PSB and DFP secant methods. It also includes the inverse-structured BFGS secant method, i.e., the case in which $\nabla^2 f(x)^{-1}$ instead of $\nabla^2 f(x)$ is assumed to be of the form given by (1.13). As an application of this theory, the local and q-superlinear convergence of the structured PSB and DFP secant methods for the nonlinear least-squares problem was established (see Chapter 10 of Dennis and Schnabel [1983]).

Xu [1986] (see Fletcher and Xu [1987]) showed that the global and local properties proved by Powell [1976] for the BFGS secant method with an inexact line search carry over to the partial-structured BFGS secant method for nonlinear least-squares problems suggested by Al-Baali and Fletcher [1985].

Another important application of structured secant methods was given by Tapia [1984]. He used the well-known bounded deterioration of the DFP and the inverse form of the BFGS secant updates as a basis for establishing bounded deterioration of the structured DFP and the inverse of the structured BFGS secant updates. He then proved local and q-superlinear convergence for the structured DFP and BFGS secant versions of his algorithms for equality constrained optimization problems. We will give more details about these algorithms in Section 4.2.1.

1.2. Standard Assumptions.

In our analysis, we will use several different matrix norms. The Frobenius norm will be denoted by $\|\cdot\|_F$. The Frobenius norm weighted by $\nabla^2 f(x_*)$ will be denoted by $\|\cdot\|_*$, i.e., $\|\cdot\|_* = \|\nabla^2 f(x_*)^{-1/2}(\cdot)\nabla^2 f(x_*)^{-1/2}\|_F$. The $\ell_2$ operator norm will be denoted by $|\cdot|$. The only vector norm that will be used is the $\ell_2$ or Euclidean norm, and it will be denoted by $\|\cdot\|$.

The standard assumptions for problem (1.1) are:

A1: Problem (1.1) has a solution $x_*$.

A2: The function $f \in C^2$, and $\nabla^2 f$ and $C$ (see (1.13)) are locally Lipschitz continuous at $x_*$, i.e., there exist positive constants $L$, $L_C$ and $\epsilon_1$ such that

$$|\nabla^2 f(x) - \nabla^2 f(x_*)| \le L\,\|x - x_*\| \tag{1.17}$$

and

$$|C(x) - C(x_*)| \le L_C\,\|x - x_*\| \tag{1.18}$$

for $x \in D_1 = \{x : \|x - x_*\| < \epsilon_1\}$.

A3: The matrix $\nabla^2 f(x_*)$ is positive definite, i.e., there exist positive constants $m$ and $M$ such that

$$m\,\|z\|^2 \le z^T\nabla^2 f(x_*)\,z \le M\,\|z\|^2 \tag{1.19}$$

for all $z \in \mathbb{R}^n$.
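Write $B_* = \nabla^2 f(x_*)$. The weighted norm $\|\cdot\|_*$ can be evaluated without forming $B_*^{-1/2}$: by the cyclic property of the trace, $\|B_*^{-1/2}XB_*^{-1/2}\|_F^2 = \operatorname{trace}(X^TB_*^{-1}XB_*^{-1})$ for any $X$. The following is a Python sketch of this identity. The function names are our own, and the comparison uses a diagonal $B_*$, whose inverse square root is known in closed form.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            M[i] = [a - f * c for a, c in zip(M[i], M[k])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def inverse(A):
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(P):
    return [list(col) for col in zip(*P)]


def weighted_norm(X, B_star):
    """||X||_* = ||B*^{-1/2} X B*^{-1/2}||_F, computed as sqrt(trace(X^T B*^{-1} X B*^{-1}))."""
    Binv = inverse(B_star)
    P = matmul(matmul(transpose(X), Binv), matmul(X, Binv))
    return math.sqrt(max(0.0, sum(P[i][i] for i in range(len(P)))))


# Check against a diagonal B*, where B*^{-1/2} = diag(1/2, 1/3) is available in closed form.
B_star = [[4.0, 0.0], [0.0, 9.0]]
X = [[1.0, 2.0], [3.0, 4.0]]
direct = math.sqrt(sum((X[i][j] / math.sqrt(B_star[i][i] * B_star[j][j])) ** 2
                       for i in range(2) for j in range(2)))
print(weighted_norm(X, B_star), direct)
```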
1.3. Local Convergence for Secant Methods.

The technique used for proving local convergence of secant methods for problem (1.1) is generally based on the bounded deterioration principle introduced by Dennis [1971] and popularized by Broyden, Dennis and Moré [1973]. Indeed, Dennis [1971] introduced the bounded deterioration principle as a majorization technique for analyzing the class of "Newton-like" methods, which includes the secant methods.

Initially this principle was stated in terms of the approximations to the Hessian. It expressed the fact that, while the sequence $\{B_k\}$ of approximations to the Hessian need not converge to $\nabla^2 f(x_*)$, it should deteriorate only in a controlled way. In mathematical terms, we can express this principle as follows: there exist nonnegative constants $\alpha_1$, $\alpha_2$ such that for $x \in N_1$ and $B \in N_2$, $B_+$ satisfies

$$\|B_+ - B_*\|_* \le \left[1 + \alpha_1\sigma(x,x_+)\right]\|B - B_*\|_* + \alpha_2\,\sigma(x,x_+), \tag{1.20}$$

where $B_* = \nabla^2 f(x_*)$, $N_1$ and $N_2$ are neighborhoods of $x_*$ and $B_*$ respectively, and $\sigma(x_1,x_2) = \max\{\|x_1 - x_*\|, \|x_2 - x_*\|\}$. Here $B_+$ stands for $B_{k+1}$ and $B$ for $B_k$.

Broyden, Dennis and Moré [1973] used this principle of bounded deterioration as a sufficient condition for local convergence of the secant methods. As an application of their theory, they showed the local convergence of the DFP secant method.

Since it was considered more convenient to work with approximations to $\nabla^2 f(x_*)^{-1}$ instead of approximations to $\nabla^2 f(x_*)$, they also stated the principle of bounded deterioration in terms of the approximations to the inverse of the Hessian. In mathematical terms it is expressed in the following way. There exist nonnegative constants $\alpha_1$, $\alpha_2$ such that for $x \in N_1$ and $B^{-1} \in N_2$, $B_+^{-1}$ satisfies

$$\|B_+^{-1} - B_*^{-1}\|_* \le \left[1 + \alpha_1\sigma(x,x_+)\right]\|B^{-1} - B_*^{-1}\|_* + \alpha_2\,\sigma(x,x_+), \tag{1.21}$$

where $N_1$ and $N_2$ are neighborhoods of $x_*$ and $B_*^{-1}$ respectively.

This inverse form of the bounded deterioration inequality allowed them to prove that the BFGS secant method is locally convergent.

Based on the Broyden-Dennis-Moré theory, Dennis and Walker [1981] developed a general local convergence theory for structured secant methods that includes the inverse-structured BFGS and the structured DFP secant methods. While the structured DFP proof uses the direct form (1.20) of the bounded deterioration inequality, the inverse-structured BFGS proof uses the inverse form (1.21).

Ritter [1979] extended the Broyden-Dennis-Moré result to a subclass of Broyden's secant methods, the subclass of positive definite secant updates. His convergence results use

$$\psi = \operatorname{trace}\left(B_*^{1/2}B^{-1}B_*^{1/2} + B_*^{-1/2}BB_*^{-1/2}\right) \tag{1.22}$$

as the measure of a good approximation to $B_*$, instead of the weighted Frobenius norm of $B - \nabla^2 f(x_*)$ used in the bounded deterioration principle (1.20). Indeed, the local convergence proof follows from a principle of bounded deterioration in terms of $\psi$, i.e.,

$$\psi_+ \le \psi + \delta\,\sigma(x, x_+), \qquad \delta > 0 \tag{1.23}$$

(see expressions (3.23) and (3.24) in Ritter [1979]).
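By the cyclic property of the trace, Ritter's measure (1.22) equals $\operatorname{trace}(B_*B^{-1}) + \operatorname{trace}(B_*^{-1}B)$. Let $\lambda_1,\dots,\lambda_n > 0$ be the eigenvalues of $B_*^{-1/2}BB_*^{-1/2}$. Then $\psi = \sum_i(\lambda_i + 1/\lambda_i) \ge 2n$, with equality exactly when $B = B_*$. The following sketch computes $\psi$ this way, using hypothetical names and data.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            M[i] = [a - f * c for a, c in zip(M[i], M[k])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def inverse(A):
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def ritter_psi(B, B_star):
    """Ritter's measure (1.22), rewritten with the cyclic property of the trace:
    trace(B*^{1/2} B^{-1} B*^{1/2}) = trace(B* B^{-1}) and
    trace(B*^{-1/2} B B*^{-1/2}) = trace(B*^{-1} B)."""
    return trace(matmul(B_star, inverse(B))) + trace(matmul(inverse(B_star), B))


B_star = [[3.0, 1.0], [1.0, 2.0]]    # hypothetical Hessian at x*
B = [[2.5, 0.8], [0.8, 2.2]]         # hypothetical positive definite approximation
n = len(B_star)
print(ritter_psi(B_star, B_star), ritter_psi(B, B_star), 2 * n)
```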

Independently, Stachurski [1981] also extended the same result to Broyden's bounded $\phi$-class of secant methods, which allows the parameter $\phi$ to change at each iteration. His approach is a generalization of the Broyden-Dennis-Moré proof for the BFGS secant method. In fact, the inverse bounded deterioration inequality (1.21) appears implicitly in his proof. He also proved that the Broyden bounded $\phi$-class of secant updates includes the subclass considered by Ritter [1979]. An interesting feature of Stachurski's results is that his estimate for the radius of convergence decreases as the absolute value of $\phi$, the parameter of the secant update formula (1.7), increases.

Griewank and Toint [1982] were the first to give a unified direct bounded deterioration principle for all the members of the convex class ((1.7) with (1.8a)). They also showed that the inverse form of these secant updates satisfies the inverse form of the bounded deterioration inequality (1.21). In the same paper, they gave sufficient conditions for a member of this subclass of secant methods to have a q-superlinear rate of convergence. However, mainly because of their nonrestrictive assumptions and their big-O notation, it was not obvious how to extend this result to the structured secant methods described in Section 1.1.2. Nor was it clear how to obtain the direct form of the bounded deterioration principle for the structured secant methods, except DFP, from other approaches in the literature.

1.4. Material to Follow.

In this paper we will consider only the structured secant methods from the convex class. However, all our results are also valid for some negative values of the parameter $\phi$ and for some values greater than one.

In Section 2 we prove that the structured secant approximations to the Hessian defined in Section 1.1.2 satisfy the bounded deterioration inequality (1.20) for $\phi \in [0,1]$. Moreover, we prove that a surprising and stronger form of this bounded deterioration is valid for the structured BFGS secant method.

In Section 3 we establish the local and q-superlinear convergence of all the structured secant methods in the convex class using the theories of Broyden, Dennis and Moré [1973] and of Griewank and Toint [1982].

Finally, in Section 4 we use this theory to prove the local and q-superlinear convergence of any structured secant method from the convex class for the constrained optimization problem and the nonlinear least-squares problem. Particular cases of these methods are the SQP augmented scale BFGS and DFP secant methods for constrained optimization problems introduced by Tapia [1984]. Another particular case, for which local and q-superlinear convergence is proved for the first time here, is the Al-Baali and Fletcher [1985] modification of the structured BFGS secant method considered by Dennis, Gay and Welsch [1981] for the nonlinear least-squares problem and implemented in the current version of the NL2SOL code.


2. Bounded Deterioration.

Our objective in this section is to demonstrate that the structured secant approximations to the Hessian from the convex class satisfy the direct form of the bounded deterioration principle, i.e., for $x$ sufficiently close to $x_*$, these approximations satisfy

$$\|B_+ - \nabla^2 f(x_*)\|_* \le \left[1 + \alpha_1\sigma(x,x_+)\right]\|B - \nabla^2 f(x_*)\|_* + \alpha_2\,\sigma(x,x_+), \tag{2.1}$$

where $\alpha_1$ and $\alpha_2$ are nonnegative constants and $\sigma(x_1,x_2) = \max\{\|x_1 - x_*\|, \|x_2 - x_*\|\}$. Moreover, we will show that the structured BFGS secant approximations satisfy a surprising and stronger form of bounded deterioration. Specifically, they satisfy inequality (2.1) with $\alpha_1 = 0$.

This bounded deterioration inequality will allow us to use the Broyden-Dennis-Moré theory to establish that, under the standard assumptions, the sequence $\{x_k\}$ generated by a structured secant method from the convex class is q-linearly convergent to $x_*$. The q-superlinear convergence will then follow from Proposition 4 of Griewank and Toint [1982]. This proposition is based on the well-known Dennis-Moré characterization (see Proposition 3.1).
2.1. Important Bounds.

When the structure in the Hessian is not used, the bounds needed to prove inequality (2.1) follow from Assumption A3 and the fact that $y$ is a "good" approximation to $\nabla^2 f(x_*)s$. We formalize this fact in the following proposition.

PROPOSITION 2.1. Suppose that Standard Assumption A3 holds and let $D$ be a neighborhood of $x_*$. For $x_1, x_2 \in D$ define $s = x_2 - x_1$ and let $y$ be an approximation to $\nabla^2 f(x_*)s$. If there exists $K_1 > 0$ such that

$$\|y - \nabla^2 f(x_*)s\| \le K_1\,\sigma(x_1,x_2)\,\|s\| \tag{2.2}$$

for all $x_1, x_2 \in D$, then the following inequalities hold:

$$\|y\| \le \left(M + K_1\sigma(x_1,x_2)\right)\|s\|, \tag{2.3a}$$

$$y^Ts \le \left(M + K_1\sigma(x_1,x_2)\right)\|s\|^2, \tag{2.3b}$$

where $M$ is given in Assumption A3 (see (1.19)). Moreover, there exist positive constants $\epsilon_2$ and $\beta$ such that the following inequalities hold:

$$y^Ts \ge \beta\,\|s\|^2, \tag{2.4a}$$

$$\frac{\|y\|\,\|s\|}{y^Ts} \le \frac{M + K_1\sigma(x_1,x_2)}{\beta} \tag{2.4b}$$

for $x_1, x_2 \in D_2 = \{x : \|x - x_*\| < \epsilon_2\} \subset D$.

Proof. Let $z = y - \nabla^2 f(x_*)s$ and $x_1, x_2 \in D$. Then (2.3) follows directly from inequality (2.2) and Assumption A3 (see (1.19)). To define $D_2$, choose $\epsilon_2$ so that $K_1\epsilon_2 < m$ and $D_2 \subset D$, where $m$ is given in Assumption A3. If $x_1, x_2 \in D_2$, then $\sigma(x_1,x_2) < \epsilon_2$ and

$$y^Ts = s^T\nabla^2 f(x_*)s + z^Ts \ge \left(m - K_1\sigma(x_1,x_2)\right)\|s\|^2 \ge \left(m - K_1\epsilon_2\right)\|s\|^2,$$

so (2.4a) follows from Assumption A3 with $\beta = m - K_1\epsilon_2$. Finally, notice that for $s \ne 0$

$$\frac{\|y\|\,\|s\|}{y^Ts} = \frac{\|y\|}{\|s\|}\,\frac{\|s\|^2}{y^Ts},$$

so that (2.4b) follows from inequalities (2.3a) and (2.4a). ∎
Similarly, when the structure in the Hessian is used, the bounds needed to establish bounded deterioration (2.1) follow from Assumptions A2 and A3 and the fact that $y^\#$ is a "good" approximation to $S(x_*)s$, where $S$ is given in (1.13). We formulate this fact in the next proposition.

PROPOSITION 2.2. Suppose that Standard Assumption A2 holds and let $D$ be a neighborhood of $x_*$. For $x_1, x_2 \in D$ define $s = x_2 - x_1$ and let $y^\#$ be an approximation to $S(x_*)s$. If there exists $K_2 > 0$ such that

$$\|y^\# - S(x_*)s\| \le K_2\,\sigma(x_1,x_2)\,\|s\| \tag{2.5}$$

for all $x_1, x_2 \in D$, then there exists $K_3 > 0$ such that $y = y^\# + C(\bar x)s$, for any $\bar x \in [x_1, x_2]$, satisfies

$$\|y - \nabla^2 f(x_*)s\| \le K_3\,\sigma(x_1,x_2)\,\|s\| \tag{2.6}$$

for all $x_1, x_2 \in D_1 \cap D$, where $D_1$ is given in Assumption A2.

Proof. Let $x_1, x_2 \in D_1 \cap D$. We take advantage of the structure in $y$ and in the Hessian, $\nabla^2 f(x_*) = C(x_*) + S(x_*)$. Since $\bar x$ lies on the segment $[x_1, x_2]$, we also have $\|\bar x - x_*\| \le \sigma(x_1,x_2)$. We can therefore write

$$\begin{aligned}
\|y - \nabla^2 f(x_*)s\| &\le \|y^\# - S(x_*)s\| + \|[C(\bar x) - C(x_*)]s\| \\
&\le K_2\,\sigma(x_1,x_2)\,\|s\| + L_C\,\|\bar x - x_*\|\,\|s\| \\
&\le (K_2 + L_C)\,\sigma(x_1,x_2)\,\|s\|,
\end{aligned}$$

so (2.6) holds with $K_3 = K_2 + L_C$. ∎


2.2. Basic Lemma

The next lemma is very useful when dealing with weighted Frobenius norms. Particular cases of it were established by Powell [1978] and by Griewank and Toint [1982].

LEMMA 2.3. Consider a symmetric matrix $B \in \mathbb{R}^{n\times n}$, vectors $u, z, w \in \mathbb{R}^n$, and scalars $\alpha, \phi \in \mathbb{R}$. Suppose that

$$\alpha = u^Tz, \qquad w = \alpha u - z, \qquad u^TBz = \alpha\, z^Tz, \tag{2.7a}$$

$$u^Tu = 1 \quad\text{and}\quad u^TBu = (u^Tz)^2. \tag{2.7b}$$

If we define

$$B' = B + uu^T - zz^T + \phi\, ww^T, \tag{2.8}$$

then

$$\|B' - I\|_F^2 = \|B - I\|_F^2 - \{p + 2q\phi - r\phi^2\}, \tag{2.9}$$

where

$$p = (1 - z^Tz)^2 + 2\left[z^TBz - (z^Tz)^2\right], \qquad q = (z^Tz)^2 + z^Tz - (\alpha^2 + z^TBz), \qquad r = (z^Tw)^2. \tag{2.10}$$

Moreover, if $B$ is symmetric and positive definite, $u = \dfrac{v}{\|v\|}$ and $z = \dfrac{Bv}{\sqrt{v^TBv}}$ for some nonzero vector $v \in \mathbb{R}^n$, then

$$p \ge 0, \quad r \ge 0 \quad\text{and}\quad p + 2q - r \ge 0. \tag{2.11a}$$

Because $p + 2q\phi - r\phi^2$ is a concave function of $\phi$ that is nonnegative at $\phi = 0$ and at $\phi = 1$, these inequalities imply

$$\|B' - I\|_F \le \|B - I\|_F \quad\text{for } \phi \in [0,1]. \tag{2.11b}$$

Proof. To prove (2.9), observe that using definition (2.8) we can write

$$\begin{aligned}
(B'-I)^T(B'-I) ={}& (B-I)^T(B-I) + (B-I)uu^T + uu^T(B-I) \\
&- (B-I)zz^T - zz^T(B-I) + (u^Tu)\,uu^T \\
&+ (z^Tz)\,zz^T - (u^Tz)\,uz^T - (z^Tu)\,zu^T \\
&+ \phi\big[(B-I)ww^T + ww^T(B-I) + (u^Tw)\,uw^T \\
&\qquad + (w^Tu)\,wu^T - (z^Tw)\,zw^T - (w^Tz)\,wz^T\big] \\
&+ \phi^2 (w^Tw)\,ww^T.
\end{aligned}$$

Therefore, using $\operatorname{trace}(A + B) = \operatorname{trace}(A) + \operatorname{trace}(B)$, $\operatorname{trace}(xy^T) = x^Ty$, and $\|A\|_F^2 = \operatorname{trace}(A^TA)$, we can write

$$\begin{aligned}
\operatorname{trace}(B'-I)^T(B'-I) ={}& \operatorname{trace}(B-I)^T(B-I) + 2u^T(B-I)u \\
&- 2z^T(B-I)z + (u^Tu)^2 + (z^Tz)^2 - 2(u^Tz)^2 \\
&+ 2\phi\left[w^T(B-I)w + (u^Tw)^2 - (z^Tw)^2\right] \\
&+ \phi^2 (w^Tw)^2 \\
={}& \operatorname{trace}(B-I)^T(B-I) - \{p + 2q\phi - r\phi^2\},
\end{aligned}$$

where

$$\begin{aligned}
p &= 2z^T(B-I)z - 2u^T(B-I)u - (u^Tu)^2 - (z^Tz)^2 + 2(u^Tz)^2, \\
q &= (z^Tw)^2 - w^T(B-I)w - (u^Tw)^2, \\
r &= (w^Tw)^2.
\end{aligned}$$

Finally, using (2.7), which gives $u^Tw = 0$ and $w^Tw = -z^Tw = z^Tz - \alpha^2$, these expressions can be reduced to the ones given by (2.10).

To demonstrate (2.11a), notice that the given $u$ and $z$ satisfy (2.7) for any vector $v \ne 0$, with $\alpha = (v^TBv)^{1/2}/\|v\|$. From (2.10), $r \ge 0$ is obviously true, and $p \ge 0$ will be true if $z^TBz - (z^Tz)^2 \ge 0$. Since

$$p + 2q - r = (1 - \alpha^2)^2 + 2\alpha^2(z^Tz - \alpha^2),$$

$p + 2q - r \ge 0$ will be true if $z^Tz - \alpha^2 \ge 0$. Using the definitions of $u$, $z$ and $\alpha$ we can write

$$z^TBz - (z^Tz)^2 = \frac{v^TB^3v}{v^TBv} - \left[\frac{v^TB^2v}{v^TBv}\right]^2 = \frac{v^TB^3v\; v^TBv - \left[v^TB^2v\right]^2}{\left[v^TBv\right]^2}$$

and

$$z^Tz - \alpha^2 = \frac{v^TB^2v}{v^TBv} - \frac{v^TBv}{v^Tv} = \frac{v^TB^2v\; v^Tv - \left[v^TBv\right]^2}{v^TBv\; v^Tv}.$$

We will now show that the numerators of these expressions are nonnegative. From the Cauchy-Schwarz inequality we have

$$v^TB^3v\; v^TBv = \|B^{3/2}v\|^2\,\|B^{1/2}v\|^2 \ge \left[(B^{3/2}v)^T(B^{1/2}v)\right]^2 = \left[v^TB^2v\right]^2,$$

and

$$v^TB^2v\; v^Tv = \|Bv\|^2\,\|v\|^2 \ge \left[(Bv)^Tv\right]^2 = \left[v^TBv\right]^2. \qquad ∎$$
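Lemma 2.3 is easy to test numerically. The following sketch draws a random symmetric positive definite $B$ and a vector $v$, and builds $u$, $z$, $\alpha$ and $w$ as in the second part of the lemma. The seed, the dimension and the shift $0.1I$ are arbitrary choices for this example. The sketch then checks the hypotheses (2.7), the identity (2.9) for several values of $\phi$, the sign conditions (2.11a), and inequality (2.11b) on a grid of $\phi \in [0,1]$.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def frob2(M):
    return sum(x * x for row in M for x in row)


random.seed(0)
n = 4
G = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
# Hypothetical symmetric positive definite test matrix B = G G^T + 0.1 I.
B = [[dot(G[i], G[j]) + (0.1 if i == j else 0.0) for j in range(n)] for i in range(n)]
v = [random.uniform(-1.0, 1.0) for _ in range(n)]

# u, z as in the second part of Lemma 2.3; alpha and w as in (2.7a).
Bv = matvec(B, v)
u = [vi / math.sqrt(dot(v, v)) for vi in v]
z = [bi / math.sqrt(dot(v, Bv)) for bi in Bv]
alpha = dot(u, z)
w = [alpha * ui - zi for ui, zi in zip(u, z)]

# Hypotheses (2.7a)-(2.7b): u^T B z = alpha z^T z, u^T u = 1, u^T B u = alpha^2.
hyp_gap = max(abs(dot(u, matvec(B, z)) - alpha * dot(z, z)),
              abs(dot(u, u) - 1.0),
              abs(dot(u, matvec(B, u)) - alpha ** 2))

zz, zBz = dot(z, z), dot(z, matvec(B, z))
p = (1 - zz) ** 2 + 2 * (zBz - zz ** 2)      # (2.10)
q = zz ** 2 + zz - (alpha ** 2 + zBz)
r = dot(z, w) ** 2


def dist2(phi):
    """||B' - I||_F^2 for B' = B + u u^T - z z^T + phi w w^T, as in (2.8)."""
    return frob2([[B[i][j] + u[i] * u[j] - z[i] * z[j] + phi * w[i] * w[j]
                   - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])


base = frob2([[B[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
identity_gap = max(abs(dist2(phi) - (base - (p + 2 * q * phi - r * phi ** 2)))
                   for phi in (-1.0, 0.0, 0.3, 1.0, 2.5))          # identity (2.9)
monotone = all(dist2(k / 20) <= base + 1e-10 for k in range(21))   # (2.11b)
print(hyp_gap, identity_gap, p >= 0, r >= 0, p + 2 * q - r >= 0, monotone)
```

The identity (2.9) is checked for values of $\phi$ outside $[0,1]$ as well, because it holds for every real $\phi$. Only the inequality (2.11b) is restricted to $[0,1]$.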


2.3. Bounded Deterioration for the Secant Approximations.

Now we establish the bounded deterioration inequality for the (unstructured) secant approximations from the convex class. The proof is based on the approach used by Griewank and Toint [1982]. However, our result is stronger than the specialization of their result to the BFGS update: we obtain a sharper bounded deterioration inequality. Moreover, in order to fully expose the ideas involved, we will not assume that the problem has been transformed so that the Hessian at $x_*$ is the identity matrix.

THEOREM 2.4. Suppose that Standard Assumption A3 holds. Let $B_+$ be an (unstructured) secant update from the convex class, i.e.,

$$B_+ = B + A_1(s, y, B, \phi), \tag{2.12}$$

where $s = x_+ - x$, $A_1(s, y, B, \phi)$ is given by (1.7) with the parameter $\phi \in [0,1]$, and $y$ is an approximation to $\nabla^2 f(x_*)s$. If there exist $D$, a neighborhood of $x_*$, and $K_1$, a positive constant, such that

$$\|y - \nabla^2 f(x_*)s\| \le K_1\,\sigma(x, x_+)\,\|s\| \tag{2.13}$$

for $x, x_+ \in D$, then the bounded deterioration inequality (2.1) holds whenever $x, x_+ \in D_2$, where $D_2$ is given in Proposition 2.1.

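Proof. Let $B_* = \nabla^2 f(x_*)$ and $x, x_+ \in D_2$, and define

$$B' = B + A_1(s, B_*s, B, \phi). \tag{2.14}$$

The idea of the proof is to determine bounds on $\|B_+ - B'\|_*$ and $\|B' - B_*\|_*$ in terms of $\|B - B_*\|_*$, and then use the triangle inequality to obtain the bounded deterioration inequality (2.1).

The bound on $\|B' - B_*\|_*$ follows from (2.11) in Lemma 2.3. If $\bar B' = B_*^{-1/2}B'B_*^{-1/2}$, $\bar B = B_*^{-1/2}BB_*^{-1/2}$ and $v = B_*^{1/2}s$, we can write

$$\begin{aligned}
\|B' - B_*\|_* &= \|B_*^{-1/2}(B' - B_*)B_*^{-1/2}\|_F = \|\bar B' - I\|_F \\
&= \|B_*^{-1/2}\left[B - B_* + A_1(s, B_*s, B, \phi)\right]B_*^{-1/2}\|_F \\
&= \|B_*^{-1/2}(B - B_*)B_*^{-1/2} + B_*^{-1/2}A_1(s, B_*s, B, \phi)B_*^{-1/2}\|_F \\
&= \|\bar B - I + uu^T - zz^T + \phi\, ww^T\|_F,
\end{aligned}$$

where, as in Lemma 2.3 with $\bar B$ in place of $B$, $u = v/\|v\|$, $z = \bar Bv/(v^T\bar Bv)^{1/2}$, $\alpha = u^Tz$ and $w = \alpha u - z$, so that (2.7) holds. Therefore, by (2.11),

$$\|B' - B_*\|_* \le \|B - B_*\|_* \quad\text{for } \phi \in [0,1]. \tag{2.15}$$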

To derive a bound on $\|B_+ - B'\|_*$, observe that

$$B_+ - B' = E_1 + \phi\,(E_2 + E_2^T + E_3), \tag{2.16}$$

where

$$E_1 = \frac{yy^T}{y^Ts} - \frac{B_*ss^TB_*}{s^TB_*s}, \qquad E_2 = \left[\frac{B_*s}{s^TB_*s} - \frac{y}{y^Ts}\right]s^TB, \qquad E_3 = s^TBs\left[\frac{yy^T}{(y^Ts)^2} - \frac{B_*ss^TB_*}{(s^TB_*s)^2}\right].$$
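Adding and subtracting the appropriate terms, we bound the Frobenius norms of these matrices using the following facts:

- Assumption A3, which gives $\|B_*s\| \le M\|s\|$ and $s^TB_*s \ge m\|s\|^2$;
- the identity $\|xy^T\|_F = \|x\|\,\|y\|$;
- inequalities (2.3) and (2.4), which follow from condition (2.13) and Proposition 2.1;
- the resulting bound $\|y + B_*s\| \le (2M + K_1\sigma(x,x_+))\|s\|$;
- $\sigma(x,x_+) < \epsilon_2$.

This gives

$$\begin{aligned}
\|E_1\|_F &\le \left\|\frac{y(y - B_*s)^T}{y^Ts}\right\|_F + \left\|ys^TB_*\left[\frac{1}{y^Ts} - \frac{1}{s^TB_*s}\right]\right\|_F + \left\|\frac{(y - B_*s)s^TB_*}{s^TB_*s}\right\|_F \\
&\le \frac{\|y\|\,\|y - B_*s\|}{y^Ts} + \frac{\|y\|\,\|B_*s\|\,\|y - B_*s\|\,\|s\|}{(y^Ts)(s^TB_*s)} + \frac{\|y - B_*s\|\,\|B_*s\|}{s^TB_*s} \\
&\le \left[\frac{M + K_1\sigma(x,x_+)}{\beta} + \frac{M + K_1\sigma(x,x_+)}{\beta}\,\frac{M}{m} + \frac{M}{m}\right]K_1\,\sigma(x,x_+) \\
&\le \left[\frac{M + K_1\epsilon_2}{\beta} + \frac{M + K_1\epsilon_2}{\beta}\,\frac{M}{m} + \frac{M}{m}\right]K_1\,\sigma(x,x_+) \\
&= \gamma_1\,\sigma(x,x_+),
\end{aligned}$$

$$\begin{aligned}
\|E_2\|_F &\le \left\|\frac{(B_*s - y)s^TB}{s^TB_*s}\right\|_F + \left\|ys^TB\left[\frac{1}{s^TB_*s} - \frac{1}{y^Ts}\right]\right\|_F \\
&\le \frac{\|y - B_*s\|\,\|Bs\|}{s^TB_*s} + \frac{\|y\|\,\|Bs\|\,\|y - B_*s\|\,\|s\|}{(s^TB_*s)(y^Ts)} \\
&\le |B|\left[\frac{1}{m} + \frac{M + K_1\sigma(x,x_+)}{\beta m}\right]K_1\,\sigma(x,x_+) \\
&\le |B|\left\{\left[1 + \frac{M + K_1\epsilon_2}{\beta}\right]\frac{K_1}{m}\right\}\sigma(x,x_+) \\
&= \gamma_2\,|B|\,\sigma(x,x_+),
\end{aligned}$$

and

$$\begin{aligned}
\|E_3\|_F &\le s^TBs\left\|\frac{y(y - B_*s)^T}{(y^Ts)^2} + ys^TB_*\left[\frac{1}{(y^Ts)^2} - \frac{1}{(s^TB_*s)^2}\right] + \frac{(y - B_*s)s^TB_*}{(s^TB_*s)^2}\right\|_F \\
&\le s^TBs\left[\frac{\|y\|\,\|y - B_*s\|}{(y^Ts)^2} + \frac{\|y\|\,\|B_*s\|\,\|y - B_*s\|\,\|s\|\,\|y + B_*s\|\,\|s\|}{(y^Ts)^2(s^TB_*s)^2} + \frac{\|y - B_*s\|\,\|B_*s\|}{(s^TB_*s)^2}\right] \\
&\le |B|\left\{\frac{M + K_1\sigma(x,x_+)}{\beta^2}\left[1 + \frac{M}{m^2}\left(2M + K_1\sigma(x,x_+)\right)\right] + \frac{M}{m^2}\right\}K_1\,\sigma(x,x_+) \\
&\le |B|\left\{\left[\frac{M + K_1\epsilon_2}{\beta^2}\left(1 + \frac{M}{m^2}(2M + K_1\epsilon_2)\right) + \frac{M}{m^2}\right]K_1\right\}\sigma(x,x_+) \\
&= \gamma_3\,|B|\,\sigma(x,x_+).
\end{aligned}$$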


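We now combine (2.16) with the bounds on $\|E_1\|_F$, $\|E_2\|_F$ and $\|E_3\|_F$, using also $\|E_2^T\|_F = \|E_2\|_F$ and $|B| \le |B - B_*| + |B_*| \le \|B - B_*\|_F + M$. This gives

$$\begin{aligned}
\|B_+ - B'\|_F &\le \left[\gamma_1 + \phi(2\gamma_2 + \gamma_3)|B|\right]\sigma(x,x_+) \\
&\le \left[\gamma_1 + \phi\gamma_4\left(|B - B_*| + |B_*|\right)\right]\sigma(x,x_+) \\
&\le \left[(\gamma_1 + \phi\gamma_4 M) + \phi\gamma_4\|B - B_*\|_F\right]\sigma(x,x_+),
\end{aligned}$$

where $\gamma_4 = 2\gamma_2 + \gamma_3$. For any $X \in \mathbb{R}^{n\times n}$ we have $\|X\|_* \le |B_*^{-1}|\,\|X\|_F \le \|X\|_F/m$ and $\|X\|_F \le M\|X\|_*$. Hence

$$\|B_+ - B'\|_* \le |B_*^{-1}|\,\|B_+ - B'\|_F \le \left[\alpha_1\|B - B_*\|_* + \alpha_2\right]\sigma(x,x_+), \tag{2.18a}$$

where

$$\alpha_1 = \frac{\phi\gamma_4 M}{m} \quad\text{and}\quad \alpha_2 = \frac{\gamma_1 + \phi\gamma_4 M}{m}. \tag{2.18b}$$

Finally, the triangle inequality and inequalities (2.15) and (2.18) give us the bounded deterioration inequality (2.1) as follows:

$$\begin{aligned}
\|B_+ - B_*\|_* &\le \|B_+ - B'\|_* + \|B' - B_*\|_* \\
&\le \left[\alpha_1\|B - B_*\|_* + \alpha_2\right]\sigma(x,x_+) + \|B - B_*\|_* \\
&= \left[1 + \alpha_1\sigma(x,x_+)\right]\|B - B_*\|_* + \alpha_2\,\sigma(x,x_+).
\end{aligned} \tag{2.19}$$

∎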
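The two ingredients of the proof of Theorem 2.4, the splitting (2.16) and the inequality (2.15), can be checked on an example. In this sketch, $B_*$, $B$, $s$, the perturbation that defines $y$, and $\phi = 0.3$ are all hypothetical test data. The norm $\|\cdot\|_*$ is evaluated through the trace identity $\|X\|_*^2 = \operatorname{trace}(X^TB_*^{-1}XB_*^{-1})$.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def matmul(P, Q):
    return [[dot(row, col) for col in zip(*Q)] for row in P]


def outer(a, b, c=1.0):
    return [[c * ai * bj for bj in b] for ai in a]


def add(*Ms):
    return [[sum(x) for x in zip(*rows)] for rows in zip(*Ms)]


def sub(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (fine for small test matrices)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [x / piv for x in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[k])]
    return [row[n:] for row in M]


def weighted_norm(X, B_star):
    """||X||_* via the identity ||X||_*^2 = trace(X^T B*^{-1} X B*^{-1})."""
    Binv = inverse(B_star)
    P = matmul(matmul(transpose(X), Binv), matmul(X, Binv))
    return math.sqrt(max(0.0, sum(P[i][i] for i in range(len(P)))))


def A1(s, y, B, phi):
    """Broyden-class correction (1.7)."""
    Bs = matvec(B, s)
    ys, sBs = dot(y, s), dot(s, Bs)
    u = [yi / ys - wi / sBs for yi, wi in zip(y, Bs)]
    return add(outer(y, y, 1 / ys), outer(Bs, Bs, -1 / sBs), outer(u, u, phi * sBs))


# Hypothetical test data: B* and B symmetric positive definite, y a perturbation of B* s.
B_star = [[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.5]]
B = [[2.5, 0.6, 0.1], [0.6, 2.4, 0.3], [0.1, 0.3, 1.2]]
s = [0.3, -0.2, 0.1]
Bts = matvec(B_star, s)
y = [a + e for a, e in zip(Bts, [0.01, -0.005, 0.002])]
phi = 0.3

B_plus = add(B, A1(s, y, B, phi))       # (2.12)
B_prime = add(B, A1(s, Bts, B, phi))    # (2.14)

Bs = matvec(B, s)
ys, sBts, sBs = dot(y, s), dot(s, Bts), dot(s, Bs)
E1 = add(outer(y, y, 1 / ys), outer(Bts, Bts, -1 / sBts))
E2 = outer([a / sBts - b / ys for a, b in zip(Bts, y)], Bs)  # s^T B = (Bs)^T, B symmetric
E3 = add(outer(y, y, sBs / ys ** 2), outer(Bts, Bts, -sBs / sBts ** 2))
rhs = add(E1, [[phi * x for x in row] for row in add(E2, transpose(E2), E3)])  # (2.16)
decomposition_gap = max(abs(x) for row in sub(sub(B_plus, B_prime), rhs) for x in row)

before = weighted_norm(sub(B, B_star), B_star)
after = weighted_norm(sub(B_prime, B_star), B_star)   # (2.15): after <= before
print(decomposition_gap, before, after)
```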

Notice that the stronger form of bounded deterioration for the BFGS secant update is a consequence of the fact that the difference between $B_+$ and $B'$ does not depend on $B$: if $\phi = 0$, then $B_+ - B' = E_1$, and hence $\alpha_1 = 0$.
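The remark can be seen directly. The sketch below computes $B_+ - B' = A_1(s,y,B,\phi) - A_1(s,B_*s,B,\phi)$ for two different matrices $B$ and compares the results for $\phi = 0$ and $\phi = 0.5$. All data are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def outer(a, b, c=1.0):
    return [[c * ai * bj for bj in b] for ai in a]


def add(*Ms):
    return [[sum(x) for x in zip(*rows)] for rows in zip(*Ms)]


def sub(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def A1(s, y, B, phi):
    """Broyden-class correction (1.7)."""
    Bs = matvec(B, s)
    ys, sBs = dot(y, s), dot(s, Bs)
    u = [yi / ys - wi / sBs for yi, wi in zip(y, Bs)]
    return add(outer(y, y, 1 / ys), outer(Bs, Bs, -1 / sBs), outer(u, u, phi * sBs))


B_star = [[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.5]]
s = [0.3, -0.2, 0.1]
Bts = matvec(B_star, s)
y = [a + e for a, e in zip(Bts, [0.05, -0.03, 0.02])]   # hypothetical y near B* s

B1 = [[2.5, 0.6, 0.1], [0.6, 2.4, 0.3], [0.1, 0.3, 1.2]]
B2 = [[4.0, -0.5, 0.0], [-0.5, 1.0, 0.2], [0.0, 0.2, 2.0]]


def plus_minus_prime(B, phi):
    """B+ - B' = A_1(s, y, B, phi) - A_1(s, B* s, B, phi), from (2.12) and (2.14)."""
    return sub(A1(s, y, B, phi), A1(s, Bts, B, phi))


def max_gap(P, Q):
    return max(abs(x) for row in sub(P, Q) for x in row)


bfgs_gap = max_gap(plus_minus_prime(B1, 0.0), plus_minus_prime(B2, 0.0))    # both equal E_1
convex_gap = max_gap(plus_minus_prime(B1, 0.5), plus_minus_prime(B2, 0.5))  # depends on B
print(bfgs_gap, convex_gap)
```

For $\phi = 0$ the two differences agree up to rounding. For $\phi = 0.5$ they differ, because $E_2$ and $E_3$ involve $B$.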


2.4. Bounded Deterioration for the Structured Secant Approximations.

Finally, we prove an analogous result for the structured secant approximations from the convex class defined in Section 1.1.2.

THEOREM 2.5. Suppose that Standard Assumptions A2 and A3 hold. Let $B_+$ be a structured secant update defined in Section 1.1.2, i.e.,

$$B_+ = A_+ + C(x_+), \tag{2.20a}$$

where

$$A_+ = A + A_2(s, y^\#, A, v). \tag{2.20b}$$

Here $s = x_+ - x$, and $A_2(s, y^\#, A, v)$ is given by (1.10). The scale $v = v(s, y, B)$ is chosen such that $A_2(s, y, B, v)$ can be written as $A_1(s, y, B, \phi)$ for some $\phi \in [0,1]$. The vectors $y$ and $y^\#$ are approximations to $\nabla^2 f(x_*)s$ and $S(x_*)s$ respectively, such that $y - y^\# = C(\bar x)s$ for some $\bar x \in [x, x_+]$. Suppose there exist $D$, a neighborhood of $x_*$, and $K_2$, a positive constant, such that

$$\|y^\# - S(x_*)s\| \le K_2\,\sigma(x, x_+)\,\|s\| \tag{2.21}$$

for $x, x_+ \in D$. Then the bounded deterioration inequality (2.1) holds whenever $x, x_+ \in D_3 = D_1 \cap D_2$, where $D_1$ is given in Assumption A2 and $D_2$ in Proposition 2.1 (and Theorem 2.4).

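Proof. Let $B_* = \nabla^2 f(x_*)$ and $B^\natural = A + C(\bar x)$. Using (2.20) and the simple observation

$$\Delta_2(s, y^\#, A, v) = \Delta_2(s, y, B^\natural, v),$$

we have, for $x, x_+ \in D_3$,

$$\begin{aligned}
B_+ &= A_+ + C(x_+) \\
&= A + \Delta_2(s, y^\#, A, v) + C(x_+) \\
&= A + \Delta_2(s, y, B^\natural, v) + C(x_+) \\
&= B^\natural - C(\bar x) + \Delta_2(s, y, B^\natural, v) + C(x_+) \\
&= B^\natural + \Delta_2(s, y, B^\natural, v) + C(x_+) - C(\bar x).
\end{aligned} \tag{2.22}$$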

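Proposition 2.2 with condition (2.21) allows us to use Theorem 2.4. Since also $B^\natural = B + C(\bar x) - C(x)$, we can write

$$\begin{aligned}
\|B_+ - B_*\|_* &\le \|B^\natural + \Delta_2(s, y, B^\natural, v) - B_*\|_* + \|C(x_+) - C(\bar x)\|_* \\
&\le \big[1 + \alpha_1\sigma(x,x_+)\big]\|B^\natural - B_*\|_* + \alpha_2\,\sigma(x,x_+) \\
&\qquad + \sqrt{n}\,L_C\,\|B_*^{-1/2}\|^2\big(\|x_+ - x_*\| + \|\bar x - x_*\|\big) \\
&\le \big[1 + \alpha_1\sigma(x,x_+)\big]\big[\|B - B_*\|_* + \|C(\bar x) - C(x)\|_*\big] + \alpha_2\,\sigma(x,x_+) + \frac{2\sqrt{n}\,L_C}{m}\,\sigma(x,x_+) \\
&\le \big[1 + \alpha_1\sigma(x,x_+)\big]\|B - B_*\|_* + \alpha_2\,\sigma(x,x_+) + \big[2 + \alpha_1\varepsilon\big]\frac{2\sqrt{n}\,L_C}{m}\,\sigma(x,x_+) \\
&= \big[1 + \alpha_1\sigma(x,x_+)\big]\|B - B_*\|_* + \alpha_3\,\sigma(x,x_+),
\end{aligned}$$

which is (2.1) with

$$\alpha_3 = \alpha_2 + \frac{2\sqrt{n}\,L_C}{m}\big[2 + \alpha_1\varepsilon\big],$$

and $\alpha_1$, $\alpha_2$ are given by (2.18b) in Theorem 2.4. •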
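The proof of Theorem 2.5 hinges on the observation $\Delta_2(s, y^\#, A, v) = \Delta_2(s, y, B^\natural, v)$, where $B^\natural = A + C(\bar x)$ and $y - y^\# = C(\bar x)s$. The sketch below assumes that (1.10) is the scaled symmetric rank-two correction written in its docstring, since (1.10) is not reproduced in this section. Under that assumption the correction depends on $y$ and $B$ only through $y - Bs$, and $y - B^\natural s = y^\# - As$. The data are random placeholders.

```python
import numpy as np

def scaled_correction(s, y, B, v):
    """Scaled rank-two correction with scale vector v (assumed form of (1.10)):
    [(y - Bs) v^T + v (y - Bs)^T]/(v^T s) - ((y - Bs)^T s) v v^T/(v^T s)^2.
    Note that y and B enter only through the residual r = y - Bs."""
    r = y - B @ s
    vs = v @ s
    return (np.outer(r, v) + np.outer(v, r)) / vs - (r @ s) * np.outer(v, v) / vs**2

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
A = A + A.T                                  # symmetric approximation to S(x_*)
Jbar = rng.standard_normal((6, n))
C_bar = Jbar.T @ Jbar                        # hypothetical value of C(x_bar)
s = rng.standard_normal(n)
y_sharp = rng.standard_normal(n)             # plays the role of y^#
y = y_sharp + C_bar @ s                      # so that y - y^# = C(x_bar) s
B_nat = A + C_bar                            # B^natural = A + C(x_bar)
v = s + 0.1 * rng.standard_normal(n)         # any scale with v^T s != 0

lhs = scaled_correction(s, y_sharp, A, v)    # Delta_2(s, y^#, A, v)
rhs = scaled_correction(s, y, B_nat, v)      # Delta_2(s, y, B^natural, v)
print(np.allclose(lhs, rhs))                 # the observation behind (2.22)
print(np.allclose((A + lhs) @ s, y_sharp))   # A_+ satisfies the secant equation A_+ s = y^#
```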


3.  Local  Convergence  Theory. 

In this section we will establish the local and q-superlinear convergence of the structured secant methods from the convex class defined in Section 1.1.2. Our approach will be to use the results of Section 2 and the Broyden-Dennis-Moré theory to prove local q-linear convergence. Then we will use (2.22) and Proposition 4 of Griewank and Toint [1982] to obtain q-superlinear convergence. For completeness, we restate the Griewank-Toint proposition as follows.

PROPOSITION 3.1 (Griewank and Toint [1982]). Suppose that Standard Assumptions A1, A2 and A3 hold. Let $\{x_k\}$ be a sequence which converges to $x_*$ and satisfies

$$\sum_{k \ge 0}\|x_k - x_*\| < \infty. \tag{3.1}$$

Also, for an arbitrary sequence of $\phi$'s in $[0,1]$, let $\{B_k\}$, the approximations to the Hessian, be generated by

$$B_+ = B + \Delta_1(s, y, B, \phi), \tag{3.2}$$

where $\Delta_1(s, y, B, \phi)$ is the secant update correction given by (1.7), starting with a symmetric positive definite matrix $B_0$. Then $\{x_k\}$ converges q-superlinearly to $x_*$; equivalently, $\{B_k\}$ satisfies the Dennis-Moré characterization

$$\lim_{k\to\infty}\frac{\|(B_k - \nabla^2 f(x_*))\,s_k\|}{\|s_k\|} = 0. \tag{3.3}$$

The next theorem gives sufficient conditions to ensure local and q-superlinear convergence for any structured secant method from the convex class.

THEOREM 3.2. Suppose that Standard Assumptions A1, A2 and A3 hold. Let $s = x_1 - x_2$, and suppose $y^\#$ is an approximation to $S(x_*)s$ satisfying

$$\|y^\# - S(x_*)s\| \le K_2\,\sigma(x_1,x_2)\,\|s\| \tag{3.4}$$

for $x_1, x_2 \in D$ and some $K_2 > 0$. Then there exist positive constants $\varepsilon$, $\delta$ such that, for $x_0 \in \mathbb{R}^n$ and symmetric $A_0 \in \mathbb{R}^{n\times n}$ satisfying $\|x_0 - x_*\| < \varepsilon$ and $\|A_0 - S(x_*)\| < \delta$, the sequence $\{x_k\}$ generated by any structured secant method from the convex class for problem (1.1) is q-superlinearly convergent to $x_*$.

Proof. As in Dennis and Walker [1981], the local q-linear convergence is a straightforward application of bounded deterioration (Theorem 2.5 in this case) and the standard Broyden-Dennis-Moré theory.

Let $B_* = \nabla^2 f(x_*)$ and $A_* = S(x_*)$. Since $B_*$ is positive definite, there exist neighborhoods $N_1$ of $x_*$ and $N_2$ of $B_*$ small enough that $N_1 \subset D_3$ (see Theorem 2.5), $N_2$ contains only positive definite matrices, and $x_+ \in D_3$ for every $(x, B) \in N_1 \times N_2$. Now choose a neighborhood $N_3$ of $A_*$, and restrict $N_1$ as needed, so that $(x, A) \in N = N_1 \times N_3$ implies $A + C(x) \in N_2$.

Theorem 2.5 allows us to use Theorem 3.2 of Broyden, Dennis and Moré [1973] to prove that $\{x_k\}$ converges q-linearly to $x_*$. Finally, (2.22) allows us to use Proposition 3.1 to prove the theorem. •

4.  Applications. 

In this section we use the results of Sections 2 and 3 to establish the local and q-superlinear convergence of any structured secant method from the convex class for the constrained optimization problem and for the nonlinear least-squares problem. Particular cases of these methods are the SQP augmented scale BFGS and DFP secant methods for constrained optimization problems suggested by Tapia [1984]. Another particular case is the Al-Baali and Fletcher [1985] modification of the structured BFGS secant method considered by Dennis, Gay and Welsch [1981] for the nonlinear least-squares problem and implemented in the current version of the NL2SOL code. Its local and q-superlinear convergence is proved here for the first time.

4.1.  Nonlinear  Least  Squares. 

Our presentation of the nonlinear least-squares problem follows Chapter 10 of Dennis and Schnabel [1983]. The nonlinear least-squares problem is

$$\min_x\; f(x) = \frac{1}{2}R(x)^T R(x) = \frac{1}{2}\sum_{i=1}^{m} r_i(x)^2 \tag{4.1}$$

where $m > n$, the residual function $R: \mathbb{R}^n \to \mathbb{R}^m$ is nonlinear, and $r_i(x)$ denotes the $i$th component function of $R(x)$. Straightforward calculations show that the gradient of $f$ is given by

$$\nabla f(x) = J(x)^T R(x) \tag{4.2}$$

where $J(x)$ denotes the Jacobian of $R$ at $x$, and the Hessian of $f$ is given by
$$\nabla^2 f(x) = C(x) + S(x) \tag{4.3}$$

where

$$C(x) = J(x)^T J(x), \qquad S(x) = \sum_{i=1}^{m} r_i(x)\,\nabla^2 r_i(x), \tag{4.4}$$

and $\nabla^2 r_i(x)$ is the Hessian of $r_i$ at $x$.

4.1.1. The Structured Secant Method.

By a structured secant method for the nonlinear least-squares problem (4.1) we mean the iterative procedure

$$\begin{aligned}
x_+ &= x + s \\
A_+ &= A + \Delta_2(s, y^\#, A, v) \\
B_+ &= A_+ + C(x_+)
\end{aligned} \tag{4.5}$$

where $s$ is the quasi-Newton step defined by

$$B s = -\nabla f(x). \tag{4.6}$$

In (4.5), $A$ is an approximation to $S(x)$, and $\Delta_2(s, y^\#, A, v)$ is the secant update correction given by (1.10) with $v = v(s, y, B)$. The vectors $y$ and $y^\#$ are approximations to $\nabla^2 f(x_*)s$ and $S(x_*)s$, respectively.

The choice for $y^\#$

$$y^\# = \big[J(x_+) - J(x)\big]^T R(x_+) \tag{4.7}$$

was suggested independently by Dennis [1976] and Bartholomew-Biggs [1977]. It is currently used in the algorithms given by Dennis, Gay and Welsch [1981] and by Al-Baali and Fletcher [1985]. To compute the scale $v$, Dennis, Gay and Welsch [1981] initially used, in the NL2SOL code,

$$y = \nabla f(x_+) - \nabla f(x). \tag{4.8}$$

Al-Baali and Fletcher [1985] were the first to suggest using

$$y = y^\# + J(x_+)^T J(x_+)\,s \tag{4.9}$$

instead of (4.8) to compute $v$. In this way they introduced the structure of the problem into the scale of the update formula. This modification improved the numerical performance of the NL2SOL code (Dennis [1987]).
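The quantities in (4.2), (4.4) and (4.7)-(4.9) are easy to compute directly. The sketch below uses a hypothetical residual $R:\mathbb{R}^2 \to \mathbb{R}^3$ whose only nonlinear component is bilinear, so $S(x)$ is available in closed form. It checks three things:

- (4.9) differs from (4.7) by exactly $C(x_+)s$, which is the relation $y - y^\# = C(\bar x)s$ of Theorem 2.5 with $\bar x = x_+$.
- (4.8) differs from (4.7) by $J(x)^T[R(x_+) - R(x)]$.
- For this particular residual, (4.7) reproduces $S(x_+)s$ exactly.

```python
import numpy as np

# Hypothetical residual R: R^2 -> R^3 (not from the paper); r_3 is bilinear,
# so grad^2 r_3 is the constant matrix H3 below.
H3 = np.array([[0.0, 0.5], [0.5, 0.0]])

def R(x):
    return np.array([x[0] - 1.0, x[1] - 2.0, 0.5 * x[0] * x[1] - 1.2])

def J(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.5 * x[1], 0.5 * x[0]]])

def grad(x):            # (4.2): grad f(x) = J(x)^T R(x)
    return J(x).T @ R(x)

def C(x):               # (4.4): C(x) = J(x)^T J(x)
    return J(x).T @ J(x)

def S(x):               # (4.4): S(x) = sum_i r_i(x) grad^2 r_i(x); only r_3 is curved
    return R(x)[2] * H3

x = np.array([1.1, 1.9])
x_plus = np.array([1.02, 1.97])
s = x_plus - x

y_sharp = (J(x_plus) - J(x)).T @ R(x_plus)   # (4.7)
y_grad = grad(x_plus) - grad(x)              # (4.8): scale used originally in NL2SOL
y_af = y_sharp + C(x_plus) @ s               # (4.9): Al-Baali and Fletcher scale

print(y_sharp, y_grad, y_af)
```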

4.1.2.  Standard  Assumptions. 

Consider the following standard assumptions for problem (4.1).

A1: Problem (4.1) has a solution $x_*$.

A2: The function $f \in C^2$, and $J$ and $\nabla^2 f$ are locally Lipschitz continuous at $x_*$, i.e., there exist $L_1$, $L_2$ and $\varepsilon$ such that

$$\|J(x) - J(x_*)\| \le L_1\,\|x - x_*\| \tag{4.10a}$$

and

$$\|\nabla^2 f(x) - \nabla^2 f(x_*)\| \le L_2\,\|x - x_*\| \tag{4.10b}$$

for $x \in D = \{x : \|x - x_*\| < \varepsilon\}$.

A3: The matrix $\nabla^2 f(x_*)$ is nonsingular.

4.1.3.  Local  Convergence  Theory. 

The following lemma will serve as the foundation of our convergence result for the nonlinear least-squares problem (4.1).

LEMMA 4.1. Suppose that the standard assumptions for problem (4.1) hold. Then there exists a positive constant $K$ such that

$$\|y^\# - S(x_*)s\| \le K\,\sigma(x,x_+)\,\|s\| \tag{4.11}$$

where $y^\#$ is given by (4.7), $x, x_+ \in D$, and $s = x_+ - x$.

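Proof. By adding and subtracting the appropriate terms, we have

$$\begin{aligned}
y^\# - S(x_*)s &= J(x_+)^T R(x_+) - J(x)^T R(x_+) - S(x_*)s \\
&= \nabla f(x_+) - \nabla f(x) - J(x)^T\big[R(x_+) - R(x) - J(x_*)s\big] \\
&\quad - \big[J(x) - J(x_*)\big]^T J(x_*)s - \nabla^2 f(x_*)s.
\end{aligned} \tag{4.12}$$

From (4.10) and Lemma 4.1.15 in Dennis and Schnabel [1983] we have

$$\|\nabla f(x_+) - \nabla f(x) - \nabla^2 f(x_*)s\| \le L_2\,\sigma(x,x_+)\,\|s\| \tag{4.13a}$$

and

$$\|R(x_+) - R(x) - J(x_*)s\| \le L_1\,\sigma(x,x_+)\,\|s\|. \tag{4.13b}$$

For $x \in D$ we also have $\|J(x)\| \le L_* + L_1\varepsilon$ and $\|J(x) - J(x_*)\| \le L_1\,\sigma(x,x_+)$. Using these bounds together with (4.12) and (4.13), we obtain

$$\begin{aligned}
\|y^\# - S(x_*)s\| &\le L_2\,\sigma(x,x_+)\|s\| + \|J(x)\|\,L_1\sigma(x,x_+)\|s\| + \|J(x) - J(x_*)\|\,L_*\|s\| \\
&\le \big[L_2 + (L_1\varepsilon + L_*)L_1 + L_*L_1\big]\sigma(x,x_+)\,\|s\|,
\end{aligned}$$

where $L_* = \|J(x_*)\|$. •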
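Identity (4.12) is pure algebra. It uses only $\nabla f = J^T R$ and $\nabla^2 f = C + S$, so it holds with any point $z$ in place of $x_*$. The sketch below checks it numerically on a hypothetical residual. The points $x$, $x_+$ and $z$ are arbitrary choices.

```python
import numpy as np

# Hypothetical residual (not from the paper); z is an arbitrary point standing in for x_*.
H3 = np.array([[0.0, 0.5], [0.5, 0.0]])

def R(x):
    return np.array([x[0] - 1.0, x[1] - 2.0, 0.5 * x[0] * x[1] - 1.2])

def J(x):
    return np.array([[1.0, 0.0], [0.0, 1.0], [0.5 * x[1], 0.5 * x[0]]])

def grad(x):                         # (4.2)
    return J(x).T @ R(x)

def S(x):                            # (4.4); only r_3 has a nonzero Hessian
    return R(x)[2] * H3

def hess(x):                         # (4.3): grad^2 f = C + S
    return J(x).T @ J(x) + S(x)

x, x_plus, z = np.array([1.1, 1.9]), np.array([1.02, 1.97]), np.array([1.05, 2.0])
s = x_plus - x
y_sharp = (J(x_plus) - J(x)).T @ R(x_plus)           # (4.7)

lhs = y_sharp - S(z) @ s
rhs = (grad(x_plus) - grad(x)
       - J(x).T @ (R(x_plus) - R(x) - J(z) @ s)
       - (J(x) - J(z)).T @ J(z) @ s
       - hess(z) @ s)                                 # right-hand side of (4.12)
print(np.allclose(lhs, rhs))
```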

THEOREM 4.2. Suppose that the standard assumptions for problem (4.1) hold. Then there exist positive constants $\varepsilon$, $\delta$ such that, for $x_0 \in \mathbb{R}^n$ and symmetric $A_0 \in \mathbb{R}^{n\times n}$ satisfying $\|x_0 - x_*\| < \varepsilon$ and $\|A_0 - S(x_*)\| < \delta$, the iteration sequence $\{x_k\}$ generated by any structured secant method from the convex class for problem (4.1) is q-superlinearly convergent to $x_*$.

Proof. The proof of this theorem is a straightforward application of Theorem 3.2 and Lemma 4.1. •
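The sketch below implements one structured secant method covered by Theorem 4.2. It uses the structured update (4.5)-(4.6), with $y^\#$ from (4.7) and the Al-Baali-Fletcher $y$ from (4.9). Several choices are ours and are assumptions, not taken from this paper or from NL2SOL:

- the form of $\Delta_2$ (as in the earlier sketches);
- the BFGS-type scale $v = y + (y^Ts/s^TB^\natural s)^{1/2}B^\natural s$, evaluated at $B^\natural = A + C(x_+)$;
- the start $A_0 = 0$;
- a skip-update safeguard;
- the test problem.

With this scale, $B_+ = A_+ + C(x_+)$ coincides with the BFGS update of $B^\natural$, and the test checks this.

```python
import numpy as np

# Hypothetical small-residual test problem (not from the paper).
def R(x):
    return np.array([x[0] - 1.0, x[1] - 2.0, 0.5 * x[0] * x[1] - 1.2])

def J(x):
    return np.array([[1.0, 0.0], [0.0, 1.0], [0.5 * x[1], 0.5 * x[0]]])

def scaled_correction(s, y, B, v):
    """Assumed form of Delta_2 in (1.10); depends on y and B only via y - Bs."""
    r = y - B @ s
    vs = v @ s
    return (np.outer(r, v) + np.outer(v, r)) / vs - (r @ s) * np.outer(v, v) / vs**2

def structured_step(x, A):
    """One step of (4.5)-(4.6) with y^# from (4.7) and y from (4.9)."""
    B = A + J(x).T @ J(x)                        # B = A + C(x)
    s = -np.linalg.solve(B, J(x).T @ R(x))       # (4.6): B s = -grad f(x)
    x_plus = x + s
    J_plus = J(x_plus)
    y_sharp = (J_plus - J(x)).T @ R(x_plus)      # (4.7)
    y = y_sharp + J_plus.T @ J_plus @ s          # (4.9)
    B_nat = A + J_plus.T @ J_plus                # B^natural = A + C(x_+)
    Bs = B_nat @ s
    ys, sBs = y @ s, s @ Bs
    if ys <= 0.0 or sBs <= 0.0:                  # safeguard (our assumption): skip the update
        return x_plus, A, s, y_sharp, y, B_nat
    v = y + np.sqrt(ys / sBs) * Bs               # BFGS-type scale (assumed choice of v)
    A_plus = A + scaled_correction(s, y_sharp, A, v)
    return x_plus, A_plus, s, y_sharp, y, B_nat

x, A = np.array([1.3, 1.7]), np.zeros((2, 2))   # A_0 = 0 (assumed starting choice)
for k in range(15):
    g = J(x).T @ R(x)
    print(k, x, np.linalg.norm(g))
    if np.linalg.norm(g) < 1e-12:
        break
    x, A = structured_step(x, A)[:2]
```

The update touches only $A$, the approximation to $S$. The exact Gauss-Newton part $C(x_+)$ is recomputed at every iterate, which is the structure the method is designed to exploit.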


4.2. Constrained Optimization.

We will consider the special case of the nonlinear programming problem in which there are only equality constraints, namely,

$$\begin{aligned}
\min_x\;& f(x) \\
\text{subject to}\;& g(x) = 0
\end{aligned} \tag{4.14}$$

where $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$ are smooth nonlinear functions ($m < n$).
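Associated with problem (4.14) is the Lagrangian function

$$l(x,\lambda) = f(x) + g(x)^T\lambda. \tag{4.15}$$

Straightforward calculations show that the gradient of $l$ with respect to $x$ is given by

$$\nabla_x l(x,\lambda) = \nabla f(x) + \nabla g(x)\lambda, \tag{4.16}$$

and the Hessian of $l$ with respect to $x$ by

$$\nabla_x^2 l(x,\lambda) = \nabla^2 f(x) + \sum_{i=1}^m \lambda_i\,\nabla^2 g_i(x), \tag{4.17}$$

where $g_i: \mathbb{R}^n \to \mathbb{R}$ denotes the $i$th component function of $g$, and the columns of the $n \times m$ matrix $\nabla g(x)$ are the gradients $\nabla g_i(x)$.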


4.2.1.  The  SQP  Augmented  Scale  Secant  Method. 

Following Tapia [1984], by the SQP augmented scale secant method for the constrained optimization problem (4.14) we mean the iterative process

$$\begin{aligned}
x_+ &= x + s \\
\lambda_+ &= \lambda + \Delta\lambda \\
B_+ &= B + \Delta_2(s, y, B, v_L)
\end{aligned} \tag{4.18}$$

where $s$ is the solution of the quadratic programming problem

$$\begin{aligned}
\min_s\;& \nabla_x l(x,\lambda)^T s + \tfrac{1}{2}\,s^T B s \\
\text{subject to}\;& \nabla g(x)^T s + g(x) = 0,
\end{aligned} \tag{4.19}$$

and $\Delta\lambda$ is the multiplier associated with that solution.

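In (4.18), $B$ is a symmetric approximation to $\nabla_x^2 l(x,\lambda)$, and $\Delta_2(s, y, B, v_L)$ is the secant update correction given by (1.10), where

$$\begin{aligned}
v_L &= v(s, y_L, B_L), \\
y_L &= y + \rho\,\nabla g(x_+)\nabla g(x_+)^T s, \\
B_L &= B + \rho\,\nabla g(x_+)\nabla g(x_+)^T,
\end{aligned} \tag{4.20}$$

and $\rho$ is the penalty constant in the augmented Lagrangian function

$$L(x,\lambda;\rho) = l(x,\lambda) + \tfrac{1}{2}\rho\,g(x)^T g(x), \qquad \rho > 0. \tag{4.21}$$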
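The QP (4.19) can be solved through its optimality system. Write its multiplier as $\Delta\lambda$. Stationarity then reads $\nabla_x l(x,\lambda) + Bs + \nabla g(x)\Delta\lambda = 0$, and feasibility reads $\nabla g(x)^T s = -g(x)$. Together these give the linear system shown in the docstring. The sketch assumes that $\nabla g(x)$ is the $n\times m$ matrix with columns $\nabla g_i(x)$, as in (4.20) and (4.23). The example problem is hypothetical. It has a quadratic objective, a linear constraint and an exact $B$, so one step lands on the solution $x_* = (2/3, 1/3)$, $\lambda_* = -4/3$.

```python
import numpy as np

def sqp_step(grad_l, B, grad_g, g_val):
    """Solve the QP (4.19) through its KKT system
        [ B         grad_g ] [ s    ]   [ -grad_l ]
        [ grad_g^T  0      ] [ dlam ] = [ -g      ]
    grad_g has the constraint gradients as columns (shape n x m)."""
    n, m = grad_g.shape
    K = np.block([[B, grad_g], [grad_g.T, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([grad_l, g_val]))
    return sol[:n], sol[n:]

# Hypothetical example: minimize x0^2 + 2 x1^2 subject to x0 + x1 - 1 = 0.
x, lam = np.zeros(2), np.zeros(1)
grad_f = np.array([2.0 * x[0], 4.0 * x[1]])
grad_g = np.array([[1.0], [1.0]])
g_val = np.array([x[0] + x[1] - 1.0])
B = np.diag([2.0, 4.0])                      # exact Hessian of the Lagrangian here

s, dlam = sqp_step(grad_f + grad_g @ lam, B, grad_g, g_val)   # (4.16) gives grad_l
x_plus, lam_plus = x + s, lam + dlam                         # (4.18)
print(x_plus, lam_plus)
```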

Observe that $B_L$ is a structured approximation to the Hessian of the augmented Lagrangian at the solution, i.e.,

$$B_L \approx \nabla_x^2 L(x_*,\lambda_*;\rho) = \nabla_x^2 l(x_*,\lambda_*) + \rho\,\nabla g(x_*)\nabla g(x_*)^T, \tag{4.22a}$$

since the last term of

$$\nabla_x^2 L(x,\lambda;\rho) = \nabla_x^2 l(x,\lambda) + \rho\,\nabla g(x)\nabla g(x)^T + \rho\sum_{i=1}^m g_i(x)\,\nabla^2 g_i(x) \tag{4.22b}$$

vanishes at the solution $x_*$, where $g(x_*) = 0$. Moreover, Tapia [1984] gave strong arguments for blaming this second-order term for the poor numerical performance of the SQP augmented Lagrangian secant method for large values of $\rho$.

Three issues are important in the derivation of the SQP augmented scale secant method. First, consider the augmented Lagrangian instead of the standard Lagrangian, to compensate for the lack of positive definiteness of $\nabla_x^2 l(x_*,\lambda_*)$. Second, use the structure of $\nabla_x^2 L(x_*,\lambda_*;\rho)$ as much as possible. Finally, observe that the penalty constant cancels out in all parts of the algorithm except the scale of the secant update.

In fact, the SQP augmented scale secant method is an SQP (standard) Lagrangian secant method with a modified (or augmented) scale. This change of scale takes care of the lack of positive definiteness in the Hessian of the Lagrangian. It therefore allows us to use positive definite secant updates, such as those from the convex class, for the constrained optimization problem (4.14) without assuming that $\nabla_x^2 l(x_*,\lambda_*)$ is positive definite.
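The claim that the penalty constant survives only in the scale can be checked numerically. Since $y_L - B_L s = y - Bs$, adding $\rho\nabla g(x_+)\nabla g(x_+)^T$ to $B_+ = B + \Delta_2(s, y, B, v_L)$ gives $B_L + \Delta_2(s, y_L, B_L, v_L)$. With a BFGS-type scale, this is the BFGS update of $B_L$, which remains well defined even when $y^Ts < 0$. The form of $\Delta_2$, the BFGS-type scale and all data below are assumptions made for illustration.

```python
import numpy as np

def scaled_correction(s, y, B, v):
    """Assumed form of Delta_2 in (1.10); y and B enter only through y - Bs."""
    r = y - B @ s
    vs = v @ s
    return (np.outer(r, v) + np.outer(v, r)) / vs - (r @ s) * np.outer(v, v) / vs**2

def augmented_scale_update(B, s, y, grad_g_plus, rho):
    """B_+ = B + Delta_2(s, y, B, v_L) with v_L = v(s, y_L, B_L) as in (4.20).
    The BFGS-type scale v(s, y, B) = y + sqrt(y^T s / s^T B s) B s is an assumption."""
    P = grad_g_plus @ grad_g_plus.T
    y_L = y + rho * P @ s
    B_L = B + rho * P
    BLs = B_L @ s
    v_L = y_L + np.sqrt((y_L @ s) / (s @ BLs)) * BLs
    return B + scaled_correction(s, y, B, v_L), y_L, B_L

# Hypothetical data: indefinite B and y^T s < 0, so a plain BFGS update of B is undefined.
B = np.diag([1.0, -1.0])
s = np.array([1.0, 1.0])
y = np.array([1.0, -2.0])
grad_g_plus = np.array([[0.0], [1.0]])       # single constraint gradient at x_+
rho = 3.0

B_plus, y_L, B_L = augmented_scale_update(B, s, y, grad_g_plus, rho)
P = grad_g_plus @ grad_g_plus.T
Bs = B_L @ s
bfgs_L = B_L - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y_L, y_L) / (y_L @ s)
print(y @ s, y_L @ s)                        # y^T s < 0 but y_L^T s > 0
print(np.allclose(B_plus @ s, y))            # secant equation for the Lagrangian
print(np.allclose(B_plus + rho * P, bfgs_L)) # rho enters only through the scale
```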

Clearly, since $y^T s$ is not necessarily positive, the augmented scale secant updates in (4.18) do not have the hereditary positive definiteness property. However, they do possess this property on $N(x_+)$, where

$$N(x) = \{z \in \mathbb{R}^n : \nabla g(x)^T z = 0\} \tag{4.23}$$

(Proposition 4.4 in Tapia [1984]).

4.2.2. Standard Assumptions.

The following are standard assumptions in the theory of quasi-Newton methods for problem (4.14).

A1: Problem (4.14) has a solution $x_*$ with associated multiplier $\lambda_*$.

A2: The functions $f$ and $g_i$, $i = 1, \dots, m$, have second derivatives which are locally Lipschitz continuous at $x_*$, i.e., there exist $L$, $L_i$ ($i = 1, \dots, m$) and $\varepsilon$ such that

$$\|\nabla^2 f(x) - \nabla^2 f(x_*)\| \le L\,\|x - x_*\| \tag{4.24a}$$

and

$$\|\nabla^2 g_i(x) - \nabla^2 g_i(x_*)\| \le L_i\,\|x - x_*\|, \qquad i = 1, \dots, m, \tag{4.24b}$$

for $x \in D = \{x : \|x - x_*\| < \varepsilon\}$.

A3: The matrix

$$\nabla^2 l(x_*,\lambda_*) = \begin{bmatrix} \nabla_x^2 l(x_*,\lambda_*) & \nabla g(x_*) \\ \nabla g(x_*)^T & 0 \end{bmatrix}$$

is nonsingular.
In the next subsection, we will use the following well-known results.

RESULT 4.3. Suppose Assumption A1 holds. Then Assumption A3 is equivalent to the following two statements:

A3'a: The matrix $\nabla g(x_*)$ has full rank.

A3'b: The matrix $\nabla_x^2 l(x_*,\lambda_*)$ is positive definite on the subspace $N(x_*)$, where $N(x)$ is given by (4.23).

RESULT 4.4. Suppose that the standard assumptions for problem (4.14) hold. Then there exists $\rho_*$ such that $\nabla_x^2 L(x_*,\lambda_*;\rho)$ is positive definite for any $\rho > \rho_*$ (see Corollary 12.9 and Theorem 12.10 of Avriel [1976]).
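A two-dimensional illustration of Result 4.4, with hypothetical data. Take the Hessian of the Lagrangian to be $\mathrm{diag}(1,-1)$ and $\nabla g(x_*) = e_2$. This Hessian is indefinite, but it is positive definite on $N(x_*) = \mathrm{span}\{e_1\}$. Then $\nabla_x^2 l + \rho\nabla g\nabla g^T = \mathrm{diag}(1, \rho - 1)$, so $\rho_* = 1$ in this example.

```python
import numpy as np

# Hypothetical data: Hessian of the Lagrangian at (x_*, lambda_*) and the constraint gradient.
H = np.diag([1.0, -1.0])              # indefinite, but e_1^T H e_1 = 1 > 0
grad_g = np.array([[0.0], [1.0]])     # N(x_*) = {z : grad_g^T z = 0} = span{e_1}

def min_eig_augmented(rho):
    """Smallest eigenvalue of H + rho * grad_g grad_g^T, cf. (4.22a)."""
    return np.linalg.eigvalsh(H + rho * (grad_g @ grad_g.T))[0]

for rho in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(rho, min_eig_augmented(rho))
```

In this example the smallest eigenvalue is $\min(1, \rho - 1)$. It becomes positive exactly when $\rho$ exceeds $\rho_* = 1$ and never exceeds 1, the curvature on $N(x_*)$.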

4.2.3.  Local  Convergence  Theory. 

Tapia [1984] used the Fontecilla-Steihaug-Tapia [1987] and Broyden-Dennis-Moré [1973] theories to prove that, under the standard assumptions, the SQP augmented scale BFGS and DFP secant methods are locally and q-superlinearly convergent to $x_*$. In this section, we use a similar approach to generalize this result to any SQP augmented scale secant method from the convex class. The main difference in our approach is the unified way in which we obtain the bounded deterioration inequality for all the augmented scale secant updates from the convex class. Indeed, this inequality follows from Theorem 2.5 and the following lemma.

LEMMA 4.5. Suppose that the standard assumptions for problem (4.14) hold. Then there exists a positive constant $K$ such that

$$\|y - \nabla_x^2 l(x_*,\lambda_*)s\| \le K\,\sigma(x,x_+)\,\|s\| \tag{4.25}$$

where $y$ is given by (4.20), $x, x_+ \in D$, and $s = x_+ - x$.

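Proof. By adding and subtracting the appropriate terms, we have

$$\begin{aligned}
y - \nabla_x^2 l(x_*,\lambda_*)s &= \nabla_x l(x_+,\lambda_+) - \nabla_x l(x,\lambda_+) - \nabla_x^2 l(x_*,\lambda_*)s \\
&= \nabla f(x_+) + \nabla g(x_+)\lambda_+ - \nabla f(x) - \nabla g(x)\lambda_+ - \nabla^2 f(x_*)s - \sum_{i=1}^m \lambda_i^*\,\nabla^2 g_i(x_*)s \\
&= \nabla f(x_+) - \nabla f(x) - \nabla^2 f(x_*)s \\
&\quad + \sum_{i=1}^m \big[\nabla g_i(x_+) - \nabla g_i(x) - \nabla^2 g_i(x_*)s\big]\lambda_i^* \\
&\quad + \sum_{i=1}^m \big[\nabla g_i(x_+) - \nabla g_i(x) - \nabla^2 g_i(x_*)s\big]\big[\lambda_i^+ - \lambda_i^*\big] \\
&\quad + \sum_{i=1}^m \big[\lambda_i^+ - \lambda_i^*\big]\nabla^2 g_i(x_*)s
\end{aligned} \tag{4.26}$$

where $\lambda_i^+$ and $\lambda_i^*$ are the $i$th components of $\lambda_+$ and $\lambda_*$, respectively.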

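From (4.24) and Lemma 4.1.15 in Dennis and Schnabel [1983] we have

$$\|\nabla f(x_+) - \nabla f(x) - \nabla^2 f(x_*)s\| \le L\,\sigma(x,x_+)\,\|s\| \tag{4.27a}$$

and

$$\|\nabla g_i(x_+) - \nabla g_i(x) - \nabla^2 g_i(x_*)s\| \le L_i\,\sigma(x,x_+)\,\|s\|, \qquad i = 1, \dots, m. \tag{4.27b}$$

We now combine (4.26) and (4.27), apply the Cauchy-Schwarz inequality, and use $\sigma(x,x_+) < \varepsilon$ for $x, x_+ \in D$. This gives

$$\begin{aligned}
\|y - \nabla_x^2 l(x_*,\lambda_*)s\| &\le L\,\sigma(x,x_+)\|s\| + \sum_{i=1}^m L_i|\lambda_i^*|\,\sigma(x,x_+)\|s\| \\
&\quad + \sum_{i=1}^m L_i|\lambda_i^+ - \lambda_i^*|\,\sigma(x,x_+)\|s\| + \sum_{i=1}^m \bar L_i|\lambda_i^+ - \lambda_i^*|\,\|s\| \\
&\le \Big[L + \sum_{i=1}^m L_i|\lambda_i^*|\Big]\sigma(x,x_+)\|s\| \\
&\quad + \Big[\Big(\sum_{i=1}^m L_i^2\Big)^{1/2}\varepsilon + \Big(\sum_{i=1}^m \bar L_i^2\Big)^{1/2}\Big]\|\lambda_+ - \lambda_*\|\,\|s\|
\end{aligned} \tag{4.28}$$

where $\bar L_i = \|\nabla^2 g_i(x_*)\|$.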

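From Proposition 4.2 in Fontecilla, Steihaug and Tapia [1987], there exists a positive constant $\gamma$ such that

$$\|\lambda_+ - \lambda_*\| \le \gamma\,\|x - x_*\| \tag{4.29}$$

for all $x$ close enough to $x_*$.

Therefore, using (4.28) and (4.29) together with $\|x - x_*\| \le \sigma(x,x_+)$, we establish (4.25) with

$$K = L + \sum_{i=1}^m L_i|\lambda_i^*| + \gamma\Big[\Big(\sum_{i=1}^m L_i^2\Big)^{1/2}\varepsilon + \Big(\sum_{i=1}^m \bar L_i^2\Big)^{1/2}\Big]. \tag{4.30}$$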
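As with (4.12), identity (4.26) is algebraic, so it can be checked with arbitrary points standing in for $x_*$ and $\lambda_*$. The sketch reads $y$ as $\nabla_x l(x_+,\lambda_+) - \nabla_x l(x,\lambda_+)$, as in the first line of (4.26). The functions $f$ and $g$ are hypothetical, with hand-coded derivatives.

```python
import numpy as np

# Hypothetical problem data (n = 2, m = 2):
#   f(x) = x0^3 + x0 x1 + x1^2,  g_1(x) = x0^2 + x1^2 - 1,  g_2(x) = x0 x1^2.
def grad_f(x):
    return np.array([3.0 * x[0]**2 + x[1], x[0] + 2.0 * x[1]])

def hess_f(x):
    return np.array([[6.0 * x[0], 1.0], [1.0, 2.0]])

def grad_g(x):                        # columns are grad g_1(x) and grad g_2(x)
    return np.array([[2.0 * x[0], x[1]**2],
                     [2.0 * x[1], 2.0 * x[0] * x[1]]])

def hess_g(x):                        # list of the Hessians of g_1 and g_2
    return [np.array([[2.0, 0.0], [0.0, 2.0]]),
            np.array([[0.0, 2.0 * x[1]], [2.0 * x[1], 2.0 * x[0]]])]

def grad_l(x, lam):                   # (4.16)
    return grad_f(x) + grad_g(x) @ lam

def hess_l(x, lam):                   # (4.17)
    return hess_f(x) + sum(li * Hi for li, Hi in zip(lam, hess_g(x)))

x, x_plus, z = np.array([0.6, 0.9]), np.array([0.65, 0.8]), np.array([0.7, 0.7])
lam_plus, lam_star = np.array([0.3, -1.2]), np.array([0.25, -1.0])  # z, lam_star stand in for x_*, lambda_*
s = x_plus - x
y = grad_l(x_plus, lam_plus) - grad_l(x, lam_plus)

lhs = y - hess_l(z, lam_star) @ s
Hg = hess_g(z)
dg = [grad_g(x_plus)[:, i] - grad_g(x)[:, i] - Hg[i] @ s for i in range(2)]
rhs = (grad_f(x_plus) - grad_f(x) - hess_f(z) @ s
       + sum(dg[i] * lam_star[i] for i in range(2))
       + sum(dg[i] * (lam_plus[i] - lam_star[i]) for i in range(2))
       + sum((lam_plus[i] - lam_star[i]) * (Hg[i] @ s) for i in range(2)))
print(np.allclose(lhs, rhs))          # right-hand side of (4.26)
```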

THEOREM 4.6. Suppose that the standard assumptions for problem (4.14) hold and that $\rho > 0$ has been chosen so that $\nabla_x^2 L(x_*,\lambda_*;\rho)$ is positive definite (see Result 4.4). Then there exist positive constants $\varepsilon$, $\delta$ such that, for $x_0 \in \mathbb{R}^n$ and symmetric $B_0 \in \mathbb{R}^{n\times n}$ satisfying $\|x_0 - x_*\| < \varepsilon$ and $\|B_0 - \nabla_x^2 l(x_*,\lambda_*)\| < \delta$, the iteration sequence $\{x_k\}$ generated by any SQP augmented scale secant method from the convex class is q-superlinearly convergent to $x_*$.

Proof.  This  proof  is  similar  to  the  one  given  by  Tapia  [1984]  for  the  SQP 
augmented  scale  DFP  secant  method.  The  following  can  be  seen  as  a 
generalization  of  that  result. 
First, recall that the quadratic problem (4.19) would have the same solution if we used $B_L$ and $\nabla_x L(x,\lambda;\rho)$ instead of $B$ and $\nabla_x l(x,\lambda)$, respectively (Proposition 3.1 in Tapia [1984]). Now consider $B_L$, the structured secant approximation to $\nabla_x^2 L(x_*,\lambda_*;\rho)$. For any augmented scale secant update from the convex class, the bounded deterioration inequality for $B_L$ follows from Lemma 4.5 and Theorem 2.5. In turn, this inequality allows us to use Theorem 3.1 in Fontecilla, Steihaug and Tapia [1987]. That theorem establishes the existence of the constants $\varepsilon$, $\delta$ and the q-linear convergence of the sequence $\{x_k\}$. Then, using an argument identical to the one used by Broyden, Dennis and Moré [1973], we can prove


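$$\lim_{k\to\infty}\frac{\big\|\big((B_L)_k - \nabla_x^2 L(x_*,\lambda_*;\rho)\big)s_k\big\|}{\|s_k\|} = 0. \tag{4.31}$$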


Finally, the q-superlinear convergence follows from Corollary 5.4 in Fontecilla, Steihaug and Tapia [1987].